0.0.1 RNA sequencing data from project

FASTQ files from the RNA sequencing data in this dataset were processed using standardized tools and parameters, as listed in this link. The raw transcript counts obtained from these files were analyzed as follows:

 

0.0.2 Principal component analysis of all samples:

First, a principal component analysis (PCA) was performed to see whether the non-immortalized and immortalized cell lines formed evident clusters. The two groups were distributed uniformly across the plot, with no separate clusters. PC1 and PC2 accounted for only 18-21% of the variance in the samples, suggesting that the samples were not vastly different from each other. Since most of the samples were cNF samples, this suggests that there were no major differences among the different cNF samples. Since there were only 3 pNF samples, this analysis was underpowered to capture any differences between cNF and pNF samples.
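To make the "percent of variance" figure concrete, here is a minimal sketch of how the share of variance captured by the first principal component can be computed. It is not the code used for this report: the function name `variance_explained_by_pc1` and the toy inputs are hypothetical, and power iteration stands in for whatever decomposition the analysis actually used.

```python
import math

def variance_explained_by_pc1(samples, iterations=200):
    """Return the fraction of total variance captured by PC1.

    `samples` is a list of rows (one per sample), each a list of
    feature values (for example, expression of each gene).
    """
    n = len(samples)
    p = len(samples[0])
    # Center every feature (column) at zero.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (features x features).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Total variance is the sum of the per-feature variances (the trace).
    total_variance = sum(cov[j][j] for j in range(p))
    if total_variance == 0.0:
        return 0.0
    # Power iteration approaches the largest eigenvalue of the covariance
    # matrix. Assumption: the start vector is not orthogonal to PC1.
    start_norm = math.sqrt(sum((j + 1) ** 2 for j in range(p)))
    vec = [(j + 1) / start_norm for j in range(p)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue / total_variance

# Hypothetical toy data: the second feature is exactly twice the first,
# so a single component captures all of the variance.
collinear = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
# Hypothetical toy data with equal spread in both directions.
isotropic = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
```

A low share for the leading components, as reported above, means that no single axis of variation dominates the dataset.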

 

0.0.3 Differential gene expression analysis in cNF vs icNF samples:

TMM (trimmed mean of M-values) normalization was used to account for differences in library size and to prepare the dataset for differential gene expression analysis. We then used a limma-edgeR based analysis to find differentially expressed genes in the TMM-normalized dataset.
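The idea of correcting for library size can be illustrated with a much simpler scheme than TMM. The sketch below computes counts per million (CPM), which only divides by each sample's total read count; TMM additionally estimates a per-sample scaling factor from a trimmed set of genes, which is not reproduced here. The function name `counts_per_million` and the toy counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale each sample's raw counts to counts per million (CPM).

    `counts` maps a sample name to a dict of gene -> raw count.
    This is plain library-size scaling, not TMM.
    """
    cpm = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        cpm[sample] = {
            gene: 1e6 * count / library_size
            for gene, count in gene_counts.items()
        }
    return cpm

# Hypothetical toy counts: sample_b was sequenced twice as deeply as
# sample_a, but the relative abundance of each gene is identical.
raw = {
    "sample_a": {"GENE1": 10, "GENE2": 30, "GENE3": 60},
    "sample_b": {"GENE1": 20, "GENE2": 60, "GENE3": 120},
}
normalized = counts_per_million(raw)
```

After scaling, the two toy samples have the same value for every gene, so the difference in sequencing depth no longer looks like a difference in expression.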

 

The volcano plot below shows, in red, genes that are significantly overexpressed in cNF compared to icNF with a log fold change > 2. The dots in blue are genes that are significantly underexpressed in cNF compared to icNF with a log fold change < -2. The fold-change thresholds were chosen arbitrarily for ease of visualization.
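The coloring rule for the volcano plot can be sketched as a small classifier. The fold-change cutoff of 2 comes from the text above; the significance cutoff of 0.05 on the adjusted p-value is an assumption, since the report does not state the value used. The function name `volcano_color` is hypothetical.

```python
def volcano_color(log_fc, adj_p, fc_cutoff=2.0, p_cutoff=0.05):
    """Return the plot color for one gene.

    fc_cutoff follows the thresholds described in the text.
    p_cutoff is an assumed significance level, not taken from the report.
    """
    significant = adj_p < p_cutoff
    if significant and log_fc > fc_cutoff:
        return "red"    # significant, log fold change above the cutoff
    if significant and log_fc < -fc_cutoff:
        return "blue"   # significant, log fold change below the negative cutoff
    return "grey"       # not significant, or fold change within the cutoffs

# Hypothetical genes as (log fold change, adjusted p-value) pairs.
examples = {
    "GENE_A": (3.1, 0.001),
    "GENE_B": (-2.7, 0.010),
    "GENE_C": (3.5, 0.400),
    "GENE_D": (0.8, 0.001),
}
colors = {gene: volcano_color(fc, p) for gene, (fc, p) in examples.items()}
```

Note that a gene must pass both conditions to be colored: a large fold change without significance, or a significant but small change, stays grey in this sketch.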

 

The heatmap below the volcano plot shows the expression differences of all significantly differentially expressed genes between cNF and icNF samples. The cluster tree splits the differentially expressed genes into 4 main clusters according to their expression patterns in the comparison groups.

 

0.0.4 Pathway analysis

The clusters of genes identified above were then subjected to pathway analysis using gprofiler2, which uses a hypergeometric test to assess whether particular cellular pathways are enriched in the gene lists provided.
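As a rough illustration of the hypergeometric test mentioned above, the sketch below computes the probability of seeing at least the observed overlap between a gene list and a pathway by chance. It is a textbook one-sided test, not gprofiler2's implementation, and it applies no multiple-testing correction. The function name `hypergeom_enrichment_p` and all numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(overlap, query_size, pathway_size, background_size):
    """One-sided hypergeometric p-value, P(X >= overlap).

    overlap         -- genes shared by the query list and the pathway
    query_size      -- number of genes in the query list (e.g. one cluster)
    pathway_size    -- number of background genes annotated to the pathway
    background_size -- total number of genes considered
    """
    max_overlap = min(query_size, pathway_size)
    # Number of ways to draw the query list from the background.
    total = comb(background_size, query_size)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical example: 10 background genes, 5 of them in the pathway,
# and a query list of 2 genes that both fall in the pathway.
p_value = hypergeom_enrichment_p(
    overlap=2, query_size=2, pathway_size=5, background_size=10
)
```

A small p-value means the overlap is larger than expected if the gene list had been drawn at random from the background.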

The Manhattan plots show the enrichment scores (-log10 of the p-values) for pathways from various databases (e.g. KEGG, REAC, GO:MF, GO:CC, etc.) that are enriched among the genes in each of the identified clusters (Clusters 1, 2, 3, and 4).

The table below each Manhattan plot lists the pathways enriched in the corresponding cluster, along with the p-value of enrichment.

 

0.0.4.1 All significantly differentially expressed genes:

## [1] "Pathway enrichment among differentially expressed genes"

 

0.0.4.2 Subclusters of differentially expressed genes:

## [1] "Enrichment Plot of Cluster 1 genes"
## [1] "Enrichment Plot of Cluster 2 genes"
## [1] "Enrichment Plot of Cluster 3 genes"
## [1] "Enrichment Plot of Cluster 4 genes"